R?RstudioR analyses in polished R markdown filesR resourcesRR installed
RR StudioRMarkdownRMarkdownR chunks into Rmarkdown documentsR Markdown FilesRRcode chunks in RMarkdown# symbols[]R follows the normal priority of mathematical evaluation (PEDMAS)Rinput code chunk and then output
## [1] 16
input code chunk and then output
## [1] 16
<- operator.R is case sensitive.## [1] 6
## [1] 4
These do not work
## [1] 14
## [1] 144
## [1] 2.484907
log - is a built in function of R, and therefore the object of the function needs to be put in parentheses## [1] 67
## [1] 69022864
c stands for concatenate## [1] "I Love"
## [1] "Biostatistics"
## [1] "I Love" "Biostatistics"
R thinks in terms of vectors
R user to try to write scripts with that in mind## [1] 2 3 4 2 1 2 4 5 10 8 9
## [1] 5 6 7 5 4 5 7 8 13 11 12
x is now what is called a list of character values.factors, and we can redefine our character variables as factors.R “sees” a variable using str() or class() functions.Many functions exist to operate on vectors.
sample) has an argument (replace=T)R and it is easy enough to write your own functions if none already exist to do what you want to do.seqsample## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3
## [15] 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
## [29] 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [43] 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5
## [57] 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
## [71] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3
## [85] 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7
## [99] 9.8 9.9 10.0
## [1] 10.0 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0 8.9 8.8 8.7
## [15] 8.6 8.5 8.4 8.3 8.2 8.1 8.0 7.9 7.8 7.7 7.6 7.5 7.4 7.3
## [29] 7.2 7.1 7.0 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0 5.9
## [43] 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0 4.9 4.8 4.7 4.6 4.5
## [57] 4.4 4.3 4.2 4.1 4.0 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1
## [71] 3.0 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0 1.9 1.8 1.7
## [85] 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3
## [99] 0.2 0.1 0.0
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
## [1] 100.00 98.01 96.04 94.09 92.16 90.25 88.36 86.49 84.64 82.81
## [11] 81.00 79.21 77.44 75.69 73.96 72.25 70.56 68.89 67.24 65.61
## [21] 64.00 62.41 60.84 59.29 57.76 56.25 54.76 53.29 51.84 50.41
## [31] 49.00 47.61 46.24 44.89 43.56 42.25 40.96 39.69 38.44 37.21
## [41] 36.00 34.81 33.64 32.49 31.36 30.25 29.16 28.09 27.04 26.01
## [51] 25.00 24.01 23.04 22.09 21.16 20.25 19.36 18.49 17.64 16.81
## [61] 16.00 15.21 14.44 13.69 12.96 12.25 11.56 10.89 10.24 9.61
## [71] 9.00 8.41 7.84 7.29 6.76 6.25 5.76 5.29 4.84 4.41
## [81] 4.00 3.61 3.24 2.89 2.56 2.25 1.96 1.69 1.44 1.21
## [91] 1.00 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
## [101] 0.00
dnorm() generates the probability density, which can be plotted using the curve() function.add=TRUEx <- rnorm(1000, 0, 100)
hist(x, xlim = c(-500, 500))
curve(50000 * dnorm(x, 0, 100), xlim = c(-500, 500), add = TRUE,
col = "Red")Rhist function.plot function (as well as a number of more sophisticated plotting functions).high level plotting function, which sets the stageLow level plotting functions will tweak the plots and make them beautifulseq_1 <- seq(0, 10, by = 0.1)
plot(seq_1, xlab = "space", ylab = "function of space", type = "p",
col = "red")par(mfrow = c(2, 2))
plot(seq_1, xlab = "time", ylab = "p in population 1", type = "p",
col = "red")
plot(seq_2, xlab = "time", ylab = "p in population 2", type = "p",
col = "green")
plot(seq_square, xlab = "time", ylab = "p2 in population 2",
type = "p", col = "blue")
plot(seq_square_new, xlab = "time", ylab = "p in population 1",
type = "l", col = "yellow")Complete Excercises 1.3-1.8
Rmydata <- data.frame(habitat, temp, elevation)
row.names(mydata) <- c("Reedy Lake", "Pearcadale", "Warneet",
"Cranbourne", "Lysterfield", "Red Hill", "Devilbend", "Olinda")R is being able to import data from an external sourceR.knit button to render markdown> "You know the greatest danger facing us is ourselves, an irrational fear of the unknown.
But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood."
>
> --- Captain James T. Kirk“You know the greatest danger facing us is ourselves, an irrational fear of the unknown. But there’s no such thing as the unknown — only things temporarily hidden, temporarily not understood.”
— Captain James T. Kirk
\[ \large a^x, \sqrt[n]{x}, \vec{\jmath}, \tilde{\imath}\]
\[ \large \alpha, \beta, \gamma\]
\[ \large\approx, \neq, \nsim \]
\[\large \partial, \mathbb{R}, \flat\]
Binomial sampling equation
\[\large f(k) = {n \choose k} p^{k} (1-p)^{n-k}\]
Poisson Sampling Equation
\[\large Pr(Y=r) = \frac{e^{-\mu}\mu^r}{r!}\]
\[\iint xy^2\,dx\,dy =\frac{1}{6}x^2y^3\]
$$ \begin{matrix}
-2 & 1 & 0 & 0 & \cdots & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
0 & 0 & 1 & -2 & \ddots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & 1 \\
0 & 0 & 0 & \cdots & 1 & -2
\end{matrix} $$\[ \begin{matrix} -2 & 1 & 0 & 0 & \cdots & 0 \\ 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ 0 & 0 & 1 & -2 & \ddots & \vdots \\ \vdots & \vdots & \vdots & \ddots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 1 & -2 \end{matrix} \]
This is an inline code chunk
This equation, \(y=\frac{1}{2}\), is included inline
Whereas this equation, \[y=\frac{1}{2}\], is put on a separate line
\(y=\frac{1}{2}\)
Say you perform an experiment on two different strains of stickleback fish, one from an ocean population (RS) and one from a freshwater lake (BP) by making them microbe free. Microbes in the gut are known to interact with the gut epithelium in ways that lead to a proper maturation of the immune system.
EXPERIMENTAL SETUP - You carry out an experiment by treating multiple fish from each strain so that some of them have a conventional microbiota, and some are inoculated with only one bacterial species. You then measure the levels of gene expression in the stickleback gut using RNA-seq. You suspect that the sex of the fish might be important so you track it too.
Hadley Wickham and others have written R packages to modify data
These packages do many of the same things as base functions in R
However, they are specifically designed to do them faster and more easily
Wickham also wrote the package GGPlot2 for elegant graphics creations
GG stands for ‘Grammar of Graphics’
int stands for integers
dbl stands for doubles, or real numbers
chr stands for character vectors, or strings
dttm stands for date-times (a date + a time)
lgl stands for logical, vectors that contain only TRUE or FALSE
fctr stands for factors, which R uses to represent categorical variables with fixed possible values
date stands for dates
FALSETRUENA which is ‘not available’.R numbers are doubles by default.NANaN which is ‘not a number’Inf-Infrep()dplyr for vectorsfilter().arrange().select().mutate().summarise().filter(), arrange() & select()mutate() & transmutate()This function will add a new variable that is a function of other variable(s)
This function will replace the old variable with the new variable
group_by( ) & summarize( )This first function allows you to aggregate data by values of categorical variables
Once you have done this aggregation, you can then calculate values (in this case the mean) of other variables split by the new aggregated levels of the categorical variable
xxx
xxx
xxx
xxx
xxx
xxx
xxx
xxx
xxx
“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space”
— Edward Tufte
xxx
xxx
Draw graphical elements clearly, minimizing clutter
xxx
Represent magnitudes honestly and accurately
xxx
xxx
xxx
xxx
“Graphical excellence begins with telling the truth about the data” – Tufte 1983
“Graphical excellence consists of complex ideas communicated with clarity, precision and efficiency” – Tufte 1983
xxx
geom_bar functiongeom_bar functionNow try this…
geom_bar functionand this…
geom_bar functionand finally this…
geom_histogram and geom_freqpoly functionWith this function you can make a histogram
geom_histogram and geom_freqpoly functionThis allows you to make a frequency polygram
geom_boxplot functionBoxplots are very useful for visualizing data
geom_boxplot functiongeom_boxplot functionggplot(data = mpg, mapping = aes(x = reorder(class, hwy, FUN = median),
y = hwy)) + geom_boxplot() + coord_flip()geom_point & geom_smooth functionsgeom_point & geom_smooth functionsgeom_point & geom_smooth functionsgeom_point & geom_smooth functionsggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy), method = "loess")ggplot(data = mpg, aes(displ, hwy)) + geom_point(aes(color = class)) +
geom_smooth(se = FALSE, method = "loess") + labs(title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov")xxx